AITopics | evaluation protocol

evaluation protocol

Towards Multi-Domain Learning for Generalizable Video Anomaly Detection

Neural Information Processing Systems, Mar-20-2026, 19:10:00 GMT

Most existing Video Anomaly Detection (VAD) studies have been conducted within single-domain learning, where training and evaluation are performed on a single dataset. However, the criteria for abnormal events differ across VAD datasets, making it problematic to apply a single-domain model to other domains. In this paper, we propose a new task called Multi-Domain learning for VAD (MDVAD) to explore various real-world abnormal events using multiple datasets for a general model. MDVAD involves training on datasets from multiple domains simultaneously, and we experimentally observe that Abnormal Conflicts between domains hinder learning and generalization. The task aims to address two key objectives: (i) better distinguishing between general normal and abnormal events across multiple domains, and (ii) being aware of ambiguous abnormal conflicts. This paper is the first to tackle the abnormal conflict issue and introduces a new benchmark, baselines, and evaluation protocols for MDVAD. As baselines, we propose a framework with Null(Angular)-Multiple Instance Learning and an Abnormal Conflict classifier. Through experiments on an MDVAD benchmark composed of six VAD datasets and using four different evaluation protocols, we reveal abnormal conflicts and demonstrate that the proposed baseline effectively handles these conflicts, showing robustness and adaptability across multiple domains.

artificial intelligence, data mining, proceedings, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.66)
Information Technology > Artificial Intelligence (0.63)

Revisiting OmniAnomaly for Anomaly Detection: performance metrics and comparison with PCA-based models

Alves, Bruna, Martins, Ana, Pinho, Armando J., Gouveia, Sónia

arXiv.org Machine Learning, Mar-20-2026

Deep learning models have become the dominant approach for multivariate time series anomaly detection (MTSAD), often reporting substantial performance improvements over classical statistical methods. However, these gains are frequently evaluated under heterogeneous thresholding strategies and evaluation protocols, making fair comparisons difficult. This work revisits OmniAnomaly, a widely used stochastic recurrent model for MTSAD, and systematically compares it with a simple linear baseline based on Principal Component Analysis (PCA) on the Server Machine Dataset (SMD). Both methods are evaluated under identical thresholding and evaluation procedures, with experiments repeated across 100 runs for each of the 28 machines in the dataset. Performance is evaluated using Precision, Recall and F1-score at point-level, with and without point-adjustment, and under different aggregation strategies across machines and runs, with the corresponding standard deviations also reported. The results show large variability across machines and show that PCA can achieve performance comparable to OmniAnomaly, and even outperform it when point-adjustment is not applied. These findings question the added value of more complex architectures under current benchmarking practices and highlight the critical role of evaluation methodology in MTSAD research.
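The difference between point-level scoring with and without point-adjustment can be illustrated with a minimal sketch. It assumes the commonly used definition of point-adjustment, in which every point of a ground-truth anomaly segment is counted as detected once at least one point in that segment is flagged; the abstract does not spell out the exact procedure used in the paper. The function names `point_adjust` and `precision_recall_f1` and the toy labels are hypothetical and serve only as illustration.

```python
from typing import List, Sequence, Tuple


def point_adjust(y_true: Sequence[int], y_pred: Sequence[int]) -> List[int]:
    """Return predictions after point-adjustment (assumed common definition).

    If any point inside a contiguous ground-truth anomaly segment is
    predicted as anomalous, the whole segment is marked as detected.
    """
    adjusted = list(y_pred)
    n = len(y_true)
    i = 0
    while i < n:
        if y_true[i] == 1:
            # Find the end (exclusive) of the current anomaly segment.
            j = i
            while j < n and y_true[j] == 1:
                j += 1
            # One hit anywhere in the segment marks every point in it.
            if any(y_pred[k] == 1 for k in range(i, j)):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Point-level Precision, Recall and F1 for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (hypothetical data): two anomaly segments, one partially detected.
y_true = [0, 1, 1, 1, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0, 0]

raw_scores = precision_recall_f1(y_true, y_pred)
adjusted_scores = precision_recall_f1(y_true, point_adjust(y_true, y_pred))
print("without point-adjustment:", raw_scores)
print("with point-adjustment:   ", adjusted_scores)
```

On this toy input, a single hit inside the first segment raises recall from 0.2 to 0.6 after adjustment, which shows how strongly the choice of protocol can influence reported scores.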

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2603.18985

Country: Europe > Portugal > Aveiro > Aveiro (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning

Neural Information Processing Systems, Mar-17-2026, 20:30:07 GMT

Explainability in artificial intelligence is crucial for restoring trust, particularly in areas like face forgery detection, where viewers often struggle to distinguish between real and fabricated content. Vision and Large Language Models (VLLM) bridge computer vision and natural language, offering numerous applications driven by strong common-sense reasoning. Despite their success in various tasks, the potential of vision and language remains underexplored in face forgery detection, where they hold promise for enhancing explainability by leveraging the intrinsic reasoning capabilities of language to analyse fine-grained manipulation areas. For that reason, a few works have recently started to frame the problem of deepfake detection as a Visual Question Answering (VQA) task, while nevertheless omitting the realistic and informative open-ended multi-label setting. With the rapid advances in the field of VLLM, an exponential rise of investigations in that direction is expected. As such, there is a need for a clear experimental methodology that converts face forgery detection to a VQA task to systematically and fairly evaluate different VLLM architectures. Previous evaluation studies in deepfake detection have mostly focused on the simpler binary task, overlooking evaluation protocols for multi-label fine-grained detection and text-generative models. We propose a multi-staged approach that diverges from the traditional binary evaluation protocol and conducts a comprehensive evaluation study to compare the capabilities of several VLLMs in this context. In the first stage, we assess the models' performance on the binary task and their sensitivity to given instructions using several prompts.

artificial intelligence, detection, natural language, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.62)

Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model

Neural Information Processing Systems, Mar-17-2026, 16:46:40 GMT

With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Country: Asia > China > Hong Kong (0.27)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

Neural Information Processing Systems, Feb-16-2026, 18:23:20 GMT

Despite these strides, evaluating these models poses substantial challenges.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Hong Kong (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.68)
(2 more...)

89e44582fd28ddfea1ea4dcb0ebbf4b0-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-15-2026, 18:07:44 GMT

large language model, machine learning, natural language, (24 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.14)
North America > Canada > Ontario > National Capital Region > Ottawa (0.13)
North America > Canada > Ontario > Toronto (0.13)
(44 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
(2 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Leisure & Entertainment (1.00)
(24 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
(6 more...)

92af93f73faf3cefc129b6bc55a748a9-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-12-2026, 23:03:51 GMT

detection, feature map, ranking loss, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.77)

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Neural Information Processing Systems, Feb-12-2026, 20:48:23 GMT

Second, current evaluation metrics assume the presence of sound sources at all times.

artificial intelligence, inproceeding, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

d75f561eaaf2cb754bc8d7e36d8af362-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 04:50:12 GMT

dataset, generalization, graph, (12 more...)

Neural Information Processing Systems

Country:

Europe > Germany (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning

Neural Information Processing Systems, Feb-11-2026, 07:14:03 GMT

However, common evaluation protocols only test a single, fixed data split in which train and test classes are assigned randomly. More realistic evaluations should consider a broad spectrum of distribution shifts with potentially varying degree and difficulty. In this work, we systematically construct train-test splits of increasing difficulty and present the ooDML benchmark to characterize generalization under out-of-distribution shifts in DML. ooDML is

artificial intelligence, generalization, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
